Generating Detailed Spectral Libraries for Canine Proteomes Obtained from Serum and Urine

Domestic dogs (Canis lupus familiaris) are popular companion animals. Rising medical expenses and the demand for extending their healthy lifespan have created a need for new diagnostic technologies. Companion dogs also serve as important animal models for non-clinical research because they present a variety of biological phenotypes. Proteomics has been increasingly used in dogs and humans to identify novel biomarkers of various diseases. Despite the growing application of proteomics to liquid biopsy in veterinary medicine, no publicly available spectral assay libraries have been created for the canine serum and urine proteomes. In this study, we generated spectral assay libraries for these two representative liquid-biopsy samples using mid-pH fractionation, which allows in-depth proteome coverage. The resultant canine serum and urine spectral assay libraries include 1,132 and 4,749 protein groups and 5,483 and 25,228 peptides, respectively. We made these freely accessible resources for proteomic biomarker discovery studies available through ProteomeXchange with the identifier PXD034770.


Background & Summary
Disease diagnosis is important for the welfare of companion dogs because it enables proper treatment and reduces medical expenses that may arise in the future [1][2][3]. Biomarker-driven research and development improves both the sensitivity and specificity of diagnosis for various dog diseases, which can contribute to extending the lifespan of dogs [4][5][6][7]. In addition, dogs have short lifespans and share the human living environment, which makes them an animal model suitable for understanding human diseases and supporting translational discoveries [8]. Longitudinal studies that apply multiomics to different types of samples collected throughout the life cycle from various breeds of dogs are currently being conducted [8][9][10]. The collection of such large veterinary medical data sets provides an opportunity to learn more about health and disease in dogs [11].
Mass spectrometry (MS)-based proteomics has been useful in the discovery of novel protein biomarkers because it yields quantitative protein information from complex biological samples [12][13][14][15][16]; it enables biomarker-based disease diagnosis, prognosis, and therapeutic monitoring. Deep proteomic spectral libraries are built from fractionated samples by data-dependent acquisition (DDA) mass spectrometry and are mainly used to analyze data-independent acquisition (DIA) mass spectrometry data from unfractionated samples [17]. Comprehensive spectral libraries are required for robust quantification of the largest possible number of proteins in complex samples.
Shotgun proteomics has been used in searching for protein biomarkers using canine serum and urine samples. Several studies on dilated cardiomyopathy 18 39 .
Publicly available spectral libraries for the canine serum and urine proteomes are necessary for studies aimed at discovering new disease biomarkers. Serum contains certain high-concentration proteins (e.g., albumin) that make it difficult to measure low-concentration proteins using liquid chromatography-MS (LC-MS), which remains the primary reason that more than 1,000 canine proteins have not yet been identified in urine [40]. Therefore, individual spectral libraries for serum and urine are needed and could help in identifying more proteins.
Comprehensive serum and urine spectral libraries were built from 24 fractionated serum and urine proteome samples using a high-resolution Orbitrap mass spectrometer. The serum protein library includes 5,483 peptides mapped to 1,132 serum proteins (Supplementary Table 1). The urine protein library has 25,228 peptides mapped to 4,749 urinary proteins (Supplementary Table 2). We analyzed the characteristics of the two libraries using DIALib-QC [41]. The spectral libraries can be used to interpret DIA data collected on other instruments and for qualitative and quantitative analysis of peptides and proteins through spectral matching with DDA data. We deposited the raw MS data and the spectral libraries with the ProteomeXchange Consortium [42] (http://proteomecentral.proteomexchange.org) through the Proteomics Identification Database (PRIDE) [43].

Methods
Canine serum and urine collection. The features of the dogs sampled in this study are summarized in Table 1. Samples were collected from 82 dogs belonging to 19 breeds: 44 neutered males, 28 spayed females, 5 intact females, and 5 intact males. Serum and urine samples were collected from 12 dogs with acute pancreatitis, 12 dogs with chronic pancreatitis, 18 dogs with lymphoma, and 40 healthy dogs. Blood was drawn from the cephalic vein, and sera were separated by centrifugation for 10 min. Urine samples were collected by free catch. The sampling procedures followed the guidelines of the Institutional Animal Care and Use Committee (IACUC) of Seoul National University (Approval number, SNU-200701-6-2; Approval date, 2020.09.28) [44].

Serum protein sample preparation. Serum samples (40 μL) were passed through a MARS14 column (100 × 4.6 mm; Agilent Technologies, Palo Alto, CA, USA) on a binary HPLC system (20 A Prominence; Shimadzu, Tokyo, Japan) to deplete 14 high-abundance serum proteins. The unbound fraction was quantified by BCA assay, and 100 μg of it was freeze-dried with a cold trap (CentriVap Cold Traps; Labconco, Kansas City, MO, USA).
Urine sample preparation. One milliliter of urine was lyophilized using a cold trap. Dried urine samples were resuspended in 100 μL of 5% SDS in 50 mM TEAB (pH 8.5), and 100 μg of protein was lyophilized after BCA quantification.
Peptide sample preparation. One hundred micrograms of dried serum or urine sample was reconstituted with 400 μL of 5% SDS in 50 mM TEAB (pH 8.5). Dithiothreitol was added to the denatured sample to a final concentration of 20 mM, and the reduction was carried out at 95 °C for 10 min. The reduced sample was then alkylated with iodoacetamide at a final concentration of 40 mM for 30 min at 25 °C in the dark. Using  (Table 2). Each fraction was dried and stored at -80 °C.
Nano LC-MS/MS. Peptide mixtures were separated using the Dionex UltiMate 3000 RSLC nano system (Thermo Fisher Scientific, Waltham, MA, USA). The dried sample was resuspended in 0.1% formic acid to a concentration of 1 μg/μL, and 5 μL was loaded onto a C18 PepMap trap column (20 mm × 100 μm i.d., 5 μm, 100 Å; Thermo Fisher Scientific) and separated with an Acclaim™ PepMap 100 C18 column (500 mm × 75 μm i.d., 3 μm, 100 Å; Thermo Fisher Scientific) over 200 min (250 nL/min) using a 0-48% acetonitrile gradient in 0.1% formic acid and 5% DMSO for 150 min at 50 °C. The LC was coupled to a Q Exactive HF-X mass spectrometer (Thermo Fisher Scientific) with an EASY-Spray nano-ESI source. In data-dependent mode, mass spectra were acquired by automatically switching between a full scan and the top 20 data-dependent MS/MS scans. For the full MS scan, the resolution was set to 60,000 at m/z 200 with an ion target value of 3,000,000. For MS/MS, the ion target value was set to 100,000 with a resolution of 15,000 at m/z 200. The maximum ion injection time was set to 100 ms for the full scan and 50 ms for the MS2 scan. The isolation width was 1.7 m/z, and the normalized collision energy was set to 27. Dynamic exclusion, which prevents repeated measurement of the same peptides, was set to 40 s.
Data processing. Spectronaut v14 (Biognosys, Switzerland) was used with default settings to build the spectral libraries against the integrated UniProt SwissProt and TrEMBL database (Canis lupus familiaris (Taxon ID 9615); downloaded on 05/04/2022; 113,977 entries including both reviewed (838) and unreviewed (113,139) entries). N-terminal acetylation and methionine oxidation were set as variable modifications, and cysteine carbamidomethylation was set as a fixed modification. The false discovery rate (FDR) was set to <1% at the peptide-spectrum match (PSM), peptide, and protein levels.
Spectral libraries for the canine serum and urine proteomes were created using 24 DDA raw mass spectrometry data files.

Data records
The raw mass spectrometry data (.raw), the two spectral libraries produced from them (.xls and .kit), and the DIALib-QC evaluation reports for both libraries were deposited with the ProteomeXchange Consortium through PRIDE under the dataset identifier PXD034770 [46]. The mass spectrometry files are named "Dog(name of the sample type)-(fraction number).raw". The spectral libraries are named "Dog(name of the sample type)_Library.kit" and can be imported into the Spectronaut software. An individual DIALib-QC evaluation report is included for each of the two spectral libraries.
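For readers who download the deposit, the naming scheme above can be parsed programmatically. The following is a minimal sketch only: the concrete spellings used here ("DogSerum-01.raw", "DogUrine-01.raw") are assumptions inferred from the stated pattern, not names confirmed from the repository, and `group_raw_files` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed pattern: "Dog<SampleType>-<fraction>.raw", e.g. "DogSerum-03.raw".
# The exact spelling of the sample type in the deposited names is an assumption.
RAW_NAME = re.compile(r"^Dog(?P<sample>[A-Za-z]+)-(?P<fraction>\d+)\.raw$")


def group_raw_files(names):
    """Group raw file names by sample type and return sorted fraction numbers."""
    groups = defaultdict(list)
    for name in names:
        match = RAW_NAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[match.group("sample")].append(int(match.group("fraction")))
    return {sample: sorted(fractions) for sample, fractions in groups.items()}


# Illustrative file names (hypothetical, not a listing of the actual deposit).
example = [
    "DogSerum-02.raw",
    "DogSerum-01.raw",
    "DogUrine-01.raw",
    "DogSerum_Library.kit",  # library file, ignored by the raw-file pattern
]
grouped = group_raw_files(example)
```

Grouping by sample type in this way makes it straightforward to confirm that every expected fraction is present before rebuilding or extending a library.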

Technical Validation
High-quality assay libraries are needed for accurate quantification of peptides and proteins. To capture as many peptides as possible, each library was constructed from DDA-MS datasets obtained after reducing sample complexity through mid-pH fractionation. FDR was estimated by the target-decoy method using Biognosys's Pulsar search engine, and the FDR at the PSM, peptide, and protein group levels was controlled to 1% or less. In building the spectral libraries for the serum and urine proteomes, library-wide FDR control of <1% was applied at all three levels.
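The target-decoy principle mentioned above can be illustrated with a short, generic sketch. This is not Pulsar's implementation, whose internals are not described here; it is the textbook estimate FDR ≈ decoys / targets above a score threshold, converted to q-values. The function names and the toy scores are assumptions made for illustration.

```python
def target_decoy_qvalues(psms):
    """Estimate q-values for scored matches with a generic target-decoy approach.

    psms: list of (score, is_decoy) tuples, where a higher score is a better match.
    Returns a list of (score, is_decoy, q_value) sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR at each rank: decoys seen so far divided by targets seen so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR reachable at this rank or any lower-scoring rank.
    qvalues = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qvalues.append(running_min)
    qvalues.reverse()

    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]


def accepted_targets(psms, threshold=0.01):
    """Return scores of target matches whose q-value is within the threshold."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= threshold]


# Toy example (hypothetical scores): two confident targets, then a decoy.
toy = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
kept = accepted_targets(toy, threshold=0.01)
```

In the toy data, the target scoring 7.0 ranks below a decoy and is therefore rejected at the 1% threshold, while the two higher-scoring targets are kept.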
The canine serum spectral library contained 97,888 transitions, identifying 6,159 peptide precursors and representing 5,483 stripped peptides and 1,132 protein groups. The canine urinary spectral library included 347,749 transitions, identifying 27,922 peptide precursors and representing 25,228 stripped peptides and 4,749 protein groups. Comparison of the two libraries showed that 589 protein groups were common to both, whereas 543 and 4,160 protein groups were uniquely identified in canine serum and urine, respectively (Fig. 1a). A total of 4,055 peptides were common to both libraries, whereas 1,428 and 21,173 peptides were uniquely identified in canine serum and urine, respectively (Fig. 1b).
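The overlap figures above are internally consistent, as the following minimal sketch shows. It recomputes the unique counts from the library totals and the shared counts reported in the text; `overlap_counts` and `partition` are hypothetical helper names, and the identifiers in the small set example are placeholders rather than entries from the libraries.

```python
def overlap_counts(total_a, total_b, shared):
    """Return (unique_a, shared, unique_b) from two set sizes and their intersection size."""
    if shared > min(total_a, total_b):
        raise ValueError("shared count cannot exceed either total")
    return total_a - shared, shared, total_b - shared


def partition(set_a, set_b):
    """Split two identifier sets into (only_a, common, only_b)."""
    return set_a - set_b, set_a & set_b, set_b - set_a


# Totals and shared counts as reported in the text (serum first, urine second).
protein_groups = overlap_counts(1132, 4749, 589)
peptides = overlap_counts(5483, 25228, 4055)

# The same comparison on actual identifiers; these IDs are placeholders.
only_serum, common, only_urine = partition({"P1", "P2", "P3"}, {"P2", "P3", "P4", "P5"})
```

Applied to the exported protein-group or stripped-peptide columns of the two libraries, `partition` would reproduce the Venn diagram regions of Fig. 1.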
We evaluated the quality and features of the canine serum proteome library using DIALib-QC. When the retention times (RTs) of the 2+ and 3+ charge-state precursors of the same peptide were compared, the two RTs were almost identical, with an R² value of 1, indicating high chromatographic quality (Fig. 2a). The normalized RT used here was calculated from the standard peptides in the spectral library. The library contains a higher proportion of y ions than b ions (72.4% vs. 27.6%), which is a known characteristic of higher-energy collisional dissociation fragmentation in Orbitrap instruments [47] (Fig. 2b). In addition, >99.4% of peptide fragment ions in the library have 1+ or 2+ charge states (Fig. 2c). Precursor ions with five or fewer fragment ions account for only 5.7% of the library (Fig. 2d). In the library, 98% of the precursors lie between 400 and 1,200 m/z (Fig. 2e), and 98.5% of the precursor charge states range from +2 to +4 (Fig. 2f). More than 99.6% of all peptides in the library have sequences of 30 or fewer amino acids (Fig. 2g). Proteins represented by two or more peptides account for 63.3% of the total, and proteins with five or more peptides constitute 26.9% of the proteins in the library (Fig. 2h).
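Summary statistics of this kind can be recomputed from any transition-level export of a library. The sketch below is not DIALib-QC itself; it is a simplified illustration that assumes each transition is a record with hypothetical field names (`peptide`, `precursor_charge`, `precursor_mz`, `fragment_type`), which would need to be mapped to the actual column headers of the exported file.

```python
from collections import Counter


def ion_type_fractions(transitions):
    """Fraction of transitions per fragment ion type (e.g. 'y' vs 'b')."""
    counts = Counter(t["fragment_type"] for t in transitions)
    total = sum(counts.values())
    return {ion: n / total for ion, n in counts.items()}


def precursor_summary(transitions, mz_range=(400.0, 1200.0), charges=(2, 3, 4)):
    """Summarise unique precursors (peptide + charge) by m/z window and charge state."""
    precursors = {}
    for t in transitions:
        # Several transitions share one precursor; keep one m/z per precursor.
        precursors[(t["peptide"], t["precursor_charge"])] = t["precursor_mz"]
    n = len(precursors)
    in_mz = sum(1 for mz in precursors.values() if mz_range[0] <= mz <= mz_range[1])
    in_charge = sum(1 for (_pep, z) in precursors if z in charges)
    return {
        "n_precursors": n,
        "frac_in_mz_range": in_mz / n,
        "frac_in_charge_range": in_charge / n,
    }


# Toy transitions with made-up values, only to show the expected record shape.
toy = [
    {"peptide": "PEPTIDEK", "precursor_charge": 2, "precursor_mz": 500.0, "fragment_type": "y"},
    {"peptide": "PEPTIDEK", "precursor_charge": 2, "precursor_mz": 500.0, "fragment_type": "y"},
    {"peptide": "PEPTIDEK", "precursor_charge": 2, "precursor_mz": 500.0, "fragment_type": "b"},
    {"peptide": "LONGPEPTIDER", "precursor_charge": 5, "precursor_mz": 1300.0, "fragment_type": "y"},
]
ions = ion_type_fractions(toy)
summary = precursor_summary(toy)
```

Counting precursors as unique peptide and charge pairs, rather than counting transitions, keeps the m/z and charge fractions comparable to the per-precursor percentages reported above.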